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FEAST EIGENSOLVER FOR NON-HERMITIAN PROBLEMS 

JAMES KESTYN*, ERIC POLIZZlt, AND PING TAK PETER TANG* 


Abstract. A detailed new upgrade of the FEAST eigensolver targeting non-Hermitian eigenvalue 
problems is presented and thoroughly discussed. It aims at broadening the class of eigenproblems 
that can be addressed within the framework of the FEAST algorithm. The algorithm is ideally suited 
for computing selected interior eigenvalues and their associated right/left bi-orthogonal eigenvectors, 
located within a subset of the complex plane. It combines subspace iteration with efficient contour 
integration techniques that approximate the left and right spectral projectors. We discuss the various 
algorithmic choices that have been made to improve the stability and usability of the new non- 
Hermitian eigensolver. The latter retains the convergence property and multi-level parallelism of 
Hermitian FEAST, making it a valuable new software tool for the scientific community. 
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1. Introduction. The generalized eigenvalue problem AX = BX A with A and 
B square matrices and A diagonal, is a central topic in numerical linear algebra and 
arises from a broad and diverse set of disciplines in mathematics, science and engi¬ 
neering (the problem is said “standard” if B = I or “generalized” otherwise). Solving 
the interior eigenvalue problem consists of determining nontrivial solutions {A i,Xi} 
(i.e. eigenpairs with Xi = Xa and A i = A*^) located anywhere inside the spectrum. 
Most common numerical applications lead to symmetric eigenvalue problems where A 
is real symmetric or complex Hermitian, B is symmetric or Hermitian positive definite 
(hpd), and all the obtained eigenvalues \ are real. Non-symmetric and non-Hermitian 
eigenvalue problem (including the case where A is complex symmetric) can also be 
encountered in a variety of situations resulting in complex values for A,;. In this case 
Xi is called the right eigenvector associated with A i, while one can also define a left 
eigenvector Xi = Xei solution of X H A = A X H B (i.e. A H X = B H X A*). Although 
many software packages are available for symmetric (or Hermitian) matrices (see e.g. 
[17, 23, 33, 18, 25, 5, 36, 20]), relatively few algorithms and software can handle the 
non-Hermitian problem [22, 4, 23, 3, 13]. The FEAST eigensolver [29, 10], proven to 
be a robust and efficient tool for computing the partial eigenspectrum of Hermitian 
system matrices [38], can also be generalized and applied to arbitrary non-Hermitian 
systems [21, 41, 37]. 

FEAST is a subspace iteration method that uses the Rayleigh-Ritz projection 
and an approximate spectral projector as a filter [38]. Given a Hermitian generalized 
eigenvalue problem AX = BX A of size n, the algorithm in Figure 1 outlines the 
main steps of a generic Rayleigh-Ritz subspace iteration procedure for computing m 
eigenpairs. At convergence, the algorithm yields the B-orthonormal eigensubspace 
Y m = X m = {x 1 ,x 2 ,...,x m } nxm and associated eigenvalues A Qrn = A m . Taking 
p(B~ 1 A) = B _1 A, yields the bare-bone subspace iteration (generalization of the 
power method) which converges towards the m dominant eigenvectors with the linear 
rate |A mo+ i/Ai|[31, 32, 28]. This standard approach is never used in practice. 

* Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 
01003, USA, kestyn@ecs.umass.edu. 

t Department of Electrical and Computer Engineering, Department of Mathematics and Statistics, 
University of Massachusetts, Amherst, MA 01003 USA, polizzi@ecs.umass.edu. 

JIntel Corporation, Santa Clara CA 95054 USA, peter.tang@intel.com 


1 



2 


J. KESTYN, E. POLIZZI, AND P. TANG 


0. Start: Select random subspace lm 0 = {2/1) 2/2, 2/m 0 }nxm 0 {n » mo > m) 

1. Repeat until convergence 

2. Compute Q mo = p(B~ 1 A)Y rno 

3. Orthogonalize Q mo 

4. Compute Aq = Q^ o AQ mo and Bq = Q^ 0 BQ ma 

5. Solve A q W = BqWAq with W H BqW = I moXmo 

6. Compute Y mo = Q m „W 

7. Check convergence of Y mo and A Q mg for the nn wanted eigenvalues 

8. End 


Fig. 1: Subspace iteration method with Rayleigh-Ritz projection 


Instead, it is combined with filtering using the function p which aims at improving 
the convergence rate (i.e. \p(\ mo +\) / p(\i)\i=i,..., m ) by increasing the gap between 
wanted and unwanted eigenvalues. The filtering function can also be expressed using 
the spectral decomposition of the Hermitian problem while considering the entire 
B-orthonormal eigensubspace i.e. X H BX = I. 

(1.1) = Xp{ A)X~ 1 = Xp(A)X H B. 

An ideal filter for the interior eigenvalue problem which maps all m wanted eigenvalues 
to one and all unwanted ones to zero, can be derived from the Cauchy (or Dunford) 
integral formula: 

(1.2) p(X) = £ dz{z- A) -1 , 

where the wanted eigenvalues are located inside a complex contour C. The filter then 
becomes a spectral projector, with p(B~ 1 A) = X m X^B, for the eigenvector subspace 
X m (i.e. p{B~ l A)X m = X m ) and can be written as: 

(1.3) p{B~ 1 A) = —(f dz(zB — A)~ 1 B. 

2m J c 

The FEAST method proposed in [38, 29], uses a numerical quadrature to approxi¬ 
mately compute the action of this filter onto a set of mo vectors along the subspace 
iterations. The resulting rational function p a that approximates the filter (1.2) is 
given by: 

n e 

( 1 - 4 ) p a {z) = Y2 ~^—, 

“ Zj — 2 
j=i J 

where {zj, ^j}i<j<n e are the nodes and related weights of the quadrature. We obtain 
for the subspace Q mo in step 2 of the algorithm in Figure 1: 

n e 

(1.5) Q mo = p a (B~ 1 A)Y mo = Y'UjfaB - A)~ 1 BY mo = X p a (A)X H BY mo . 

i =1 

In practice, Qm 0 can be computed by solving a small number of (independent) shifted 
linear systems over a complex contour. 

n e 

(1.6) Qmo ='^2 U} jQ™ 0 ’ with Qm 0 solution of (zjB - A)Q^ o = BY mo 
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The original FEAST paper [29] demonstrated the effectiveness of the approach 
without analysis of convergence or numerical issues. A detailed numerical analysis 
on FEAST was completed recently in [38], placing the algorithm on a more solid 
theoretical foundation. In particular, a relatively small number of quadrature nodes 
(using Gauss, Trapezoidal or Zolotarev [16] rules) on a circular contour suffices to 
produce a rapid decay of the function p a from « 1 within the search contour to ~ 0 
outside. In comparison with more standard polynomial filtering [35, 32], the rational 
filter (1.4) can lead to a very fast convergence of the subspace iteration procedure. In 
addition, all the to desired eigenvalues are expected to converge at the same rate (since 
p a {Xi) — 1 if A i is located within the search interval). The convergence rate of FEAST 
does not only depend upon the decay properties of the rational function p a , but also 
on the size of the search subspace to o which must not be chosen smaller than the 
number of eigenvalues inside the search contour (i.e. too > to). Users of the FEAST 
eigensolver are responsible for specifying an interval to search for the eigenvalues and 
a subspace size too that overestimate the number of the wanted eigenvalues. Once 
these conditions are satisfied, FEAST offers the following set of appealing features: 

(i) high robustness with well-defined convergence rate |pa(A mo +i)/p a (Ai)|i=i l ... jm ; 

(ii) all multiplicities naturally captured; 

(iii) no explicit orthogonalization procedure on long vectors required in practice 
(i.e., step-3 in Figure 1 is unnecessary as long as Bq is positive definite). We 
note in (1.5) that Q mo is naturally spanned by the eigenvector subspace; 

(iv) reusable subspace capable to generate suitable initial guess when solving a 
series of eigenvalue problems; 

(v) can exploit natural parallelism at three different levels: search intervals can 
be treated separately (no overlap) while maintaining orthogonality - linear 
systems can be solved independently across the quadrature nodes of the com¬ 
plex contour - each complex linear system with too multiple right-hand-sides 
can be solved in parallel. Consequently, within a parallel environment, the 
algorithm complexity depends on solving a single linear system using a direct 
or an iterative method. 

By allowing the search contour to be placed at arbitrary locations in the complex 
plane, the FEAST algorithm can be naturally extended to non-Hermitian problems 
which produce complex eigenvalues. The algorithm retains most of the properties 
of Hermitian FEAST including the multi-level parallelism. We note, however, a few 
theoretical and practical difficulties arising which distinguish the non-Hermitian eigen¬ 
value problems from Hermitian ones, including: (i) the treatment of defective systems 
using the Schur or Jordan forms; (ii) the notion of bi-orthogonality for dual right and 
left eigenvector subspaces; (iii) the case of ill-conditioned eigenvalue problems that 
produce sensitive eigenvalues in finite precision arithmetic; (iv) or the shift-invert 
strategy that may give rise to ill-conditioned linear systems (e.g. if a FEAST quadra¬ 
ture pole lies near a complex eigenvalue). 

The key point at which the non-Hermitian FEAST algorithm differs from the 
Hermitian one is the use of dual subspaces. Since the left and right eigenvectors 
do not necessarily lie in the same subspace, two separate projectors must then be 
calculated in order to recover both sets of vectors. A single sided algorithm where 
only the right subspace is used to project is also possible [37], but will not return a 
B-bi-orthogonal subspace of left and right eigenvectors, which can be of interest for 
many applications. In the following, all quantities associated with the left eigenvectors 
will be written with a symbol (e.g. A, Y and Q). The non-Hermitian algorithm is 
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similar to its Hermitian counterpart and follows the same steps outlined in Figure 1. 
A comparison between the main numerical operations for the two algorithms is briefly 
outlined in Figure 2. The rest of the article aims at providing all the details of the 


Hermitian FEAST 


Non-Hermitian FEAST 


Solving: AXm = 

[A m ]jj G 


BXmAm 

A mini Amax] 


Solving: AXm — BXmAm [Atti]^ £ C 
A H X m = B H X m A* m 


Inputs: A = A H , B hpd; mo > m; 

{zi,...,Zn e }, {w lt ...,w ne } 


Inputs: A and B general; mo > m; 

{zi,...,z„ e }, {W!,...,w ne } 


repeat 

Qm o = 0 
for j = 1, 7l e 

Qmo 4— (ZjB — A) 

QrriQ ^ QrriQ 


L BY m 


t i 

10 j y °cm q 


end 

Bq 4- Q» 0 BQm 0 

Check Bq hpd (resizing step) 

Aq 4- Qm 0 AQm 0 


Solve AqW = BqWAq; W h B q W = I 
ira 0 4~ Qm 0 W 

until Convergence ofYm, AQ m 

with [AQ m ]jj G ^max] 


Output: X m = Y m {X^BXm = I m )\ 
Am = A Q m 


Ym 0 , Ym 0 <— rno initial vectors; 
repeat 

Qm o = Qmo = 0 

for j = 1, n e 


Qm 0 

4 — ( ZjB 

— A)- 1 BYm 0 ; 

0*4) 

Vmo 

<- (*; B 

h _ A Hy 

-1 J)H Y 
o i m o 

Qmo 

^ Qmo 

+ WjQm o 


Qmo 

4 Qmo 

+ Qmo 


end 




b q <- 

Qm 0 BQtuq 


Check 

Bq non 

-singular 

(resizing step) 

Aq <- 

Q% 0 A Q 

*m 0 


Solve . 

a q w = 

BqWAq 

and 


Aq h W = Bq h WA* q - W h B q W = I 

Ym o ^ QmQ^V, Ym,Q ^ QmQ^Y 
until Convergence ofY m ,Ym,AQ m 
with [AQ m ]u £ C 


Output: Xm = Ym\ 

Xm = Ym (X%BX, 
Am = A Qm 


Im ); 


Fig. 2: Brief outlook and comparison between the main numerical operations for the 
FEAST algorithms applied to the Hermitian and non-Hermitian problems. 


non-Hermitian FEAST algorithm and its practical implementation. Section 2 presents 
multiples theoretical and practical algorithmic considerations, outlines the differences 
with the Hermitian FEAST algorithm, and ends with a complete description of the 
non-Hermitian algorithm with discussions on limitations. Section 3 briefly outlines 
some features of the new FEAST eigensolver version 3.0, from which the proposed 
changes here take effect. We conclude by presenting some numerical experiments in 
Section 4. 

2. Theoretical and Practical Considerations. 

2.1. Defining a search contour. A key feature of FEAST is the ability to cal¬ 
culate a subset of eigenvalues that exist within some interval. Figure 3 summarizes the 
different search contour options possible for both the Hermitian and non-Hermitian 
FEAST algorithms. 

For the Hermitian case, the user must then specify a 1-dimensional real-valued 
search interval [A m j n , A max]- These two points are used to define a circular or ellipsoid 
contour C centered on the real axis, and along which the complex integration nodes 
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Hermitian 


O 

§ 





Non-Hermitian 



REAL 


Fig. 3: Various search contour examples for the Hermitian and the non-Hermitian 
FEAST algorithms. Both algorithms feature standard ellipsoid contour options and 
the possibility to define custom arbitrary shapes. In the Hermitian case, the contour 
is symmetric with the real axis and only the nodes in the upper-half may be generated. 
In the non-Hermitian case, a full contour is needed to enclose the wanted complex 
eigenvalues. 


are generated. The choice of a particular quadrature rule will lead to a different set 
of relative positions for the nodes and associated quadrature weights i.e. [zj. w 7 }. 
Since the eigenvalues are real, it is convenient to select a symmetric contour with the 
real axis (i.e. C = C*) since it only requires one to operate the quadrature on the 
half-contour (e.g. upper half). 

With a non-Hermitian problem, it is necessary to specify a 2-dinrensional search 
contour that surrounds the wanted complex eigenvalues. Circular or ellipsoid contours 
can also be used and they can be generated using standard options included into 
FEAST v3.0. These are defined by a complex midpoint A m id and a radius r for a 
circle (for an ellipse the ratio between the horizontal axis and vertical axis diameter 
can also be specified, as well as an angle of rotation). However, in some applications 
where the eigenvalues of interest belong to a particular subset in the complex plane, 
more flexibility for selecting a search contour with arbitrary shape could be needed. 
This option also lends itself to parallelism, where a large number of eigenvalues can 
be calculated by partitioning the complex plane into multiple contours (see Section 
















6 


J. KESTYN, E. POLIZZI, AND P. TANG 


4). Consequently, a “Custom Contour” feature is also supported in FEAST v3.0 that 
allows to account for arbitrary quadrature nodes and weights. 

2.2. Right/Left Spectral Projectors and Dual Subspaces. The filtering 
function can be applied to any similarity transformation of the pencil, the most general 
of which is the Jordan Normal Form. 


( 2 . 1 ) 


p(B~ l A) = Xp(J)X~ l . 


When applied to each Jordan block J&, the expression of the operator becomes [19]: 


( 2 . 2 ) 


p(Jk) = P 


/ 

'Afc 1 ... O' 

\ 


0 A*, ' • : 



: 1 


V 

i 

O 

o 

_1 

/ 


p(Afc) 


p'(Afc) 

0 ! 


0 p(Afc) 


P (m) ( AO 

(to — 1)! 


P'( A fc ) 

0 ! 


0 ... 0 p(Afc) 


Using the Cauchy integral formula (1.2), the diagonal elements of Jfc take the values 
one or zero, while all derivatives (i.e. off-diagonal elements) are zero. In practice, 
this may not be guarantee with FEAST as the filter is approximated by the rational 
function (1.4). A generalization of the algorithm for addressing the defective systems 
would require further studies, and our current FEAST non-Hermitian algorithm as¬ 
sumes that the Jordan form reduces to an eigenvalue decomposition. Consequently, 
we consider: 


(2.3) piB-'A) = Xp(A)X~ 1 = Xp(A)X H B, 


where the left and right eigensubspaces satisfy the R-bi-orthonormal relationship i.e. 
X H BX = I. For the case of the Hermitian problem, we note that X = X and the 
relation (1.1) can then be recovered. It is also important to mention the particular 
case of complex symmetric systems (i.e. A = A T and B = B T ) which leads to 
X = X*. In general, however, the left and right vectors are not straightforwardly 
related and they must be calculated explicitly. 

From (1.2), (1.3), and (2.3), one can define the right spectral projector p(B~ 1 A) 
for the right eigenvector subspace X m (i.e. p(B~ 1 A)X m = X m ) as follow: 

(2.4) p(B- x A) = ^ jf dz(zB - A)~ l B = X m X*B. 

For the treatment of the left eigenvector subspace solution of X H A = A X H B, it is 
first convenient to define the following eigenvalue decomposition: 

(2.5) piAB- 1 ) = X~ H p(A)X H = BXp(A)X H . 

One can then construct the left spectral projector p{AB~ 1 ) (i.e. X^ = X^p(AB~ 1 )) 
as: 

piAB- 1 ) = -Lj dzB(zB - A )" 1 = BX m Xr«, 


(2.6) 
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In FEAST, the projectors are formulated using the rational function p a (1.4) along 
with the quadrature nodes and weights {zj, ^>j}i<j<n e that approximate the contour 
integrations in (2.4) and (2.6). The right and left subspaces Q mo and Q mo are then 
obtained by applying the right and left projectors onto a set of mo vectors i.e. 

n e 

(2.7) Qm 0 = P a{B- 1 A)Y mo - A)~ 1 BY mo = Xp a (A)X H BY mo . 

i=i 

and 

n e 

(2.8) Q£ 0 = Y* o p a (AB~ l ) = 2 ujY^BizjB - A)" 1 = Y” o BXp a {k)X H . 

i=i 

In practice, the calculation of both subspaces require solving a series of linear systems. 
For the right subspace, 

n e 

(2.9) Q mo = ywith Q$ 0 solution of (ZjB - A)Q^ o = BY mo , 

i=i 

which was already outlined in (1.6), and for the left subspace: 

n e 

(2.10) Q mo = with Q$ 0 solution of (zjB - A) H Q$ o = B H Y mo . 

j =i 


These numerical operations are also described in Figure 2. As a result of (2.7) and 
(2.8), Qm 0 (resp. Q mo ) is formed by a linear combinations of the columns of X mo 
(resp. X mo ). The Rayleigh-Ritz procedure should then involve the reduced matrices 
Aq and Bq formed by projecting on the right with a subspace containing the right 
eigenvectors Q mo , and projecting on the left with a subspace containing the left 
eigenvectors Qm 0 - The resulting non-Hermitian reduced system can be solved using 
the QZ algorithm [27] in LAPACK [1] to yield the right and left eigenvectors W and 
W defined in Figure 2. The long right (resp. left) Ritz vectors can then be recovered 
as Y mo = Q mo W (resp. Y mo = Q mo W), and used as initial guess subspaces for the 
next FEAST iterations until convergence. 

2.3. Discussions on Convergence. In our implementation of FEAST, the cri¬ 
teria of convergence is satisfied if the norm of the relative residual associated with the 
eigenpairs (a:,, Aj) and (xj, A*), is found below an arbitrary threshold e i.e. 


( 2 . 11 ) 


resi = max 


II Axj - XjBxiW-j || A H Xj - \*B H Xj\\i \ 
||a.Ba;i||i ’ \\aB H Xi\\i j <€ 


where the value of e can be chosen typically equal to 10 -13 if high accuracy is needed 
using double precision arithmetic. The parameter a is relative to the eigenvalue 
range in the search contour. The latter is defined differently for the Hermitian and 
non-Hermitian cases (as discussed in Section 2.1), and a non-zero value for a can be 
chosen as a = max(|A m ,; n |, |A ma x|) for the Hermitian case and a = (|A m id| +r) for the 
non-Hermitian case. 

As discussed in the introduction section, the right/left eigenvectors associated 
with A i with i = 1 ,... ,m (and hence all associated residuals resi) are expected to 
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converge linearly along the FEAST subspace iterations at the rate: \p a (A mo+ i)/ p a (K) \ 
for i = 1,... ,m. The convergence depends then on both the subspace size mo (mo > 
m) and the accuracy of the rational filter p a (1.4) that should ideally provide values 
very close to unity for eigenvalues on the interior of the search contour and zero 
elsewhere. Although quite effective, the Gauss-quadrature approach along a circular 
contour that was proposed in the original FEAST article [29], is clearly not the only 
possible choice for optimizing the convegence ratio. Three other options have already 
been considered for the Hermitian problem including [16]: (i) the Trapezoidal rule; 
(ii) different contour shapes beside a circle such as a finely tuned flat ellipse; (iii) a 
new approximation of the spectral projector based on a Zolotarev approximant to 
the sign function which, after transformations, provides complex poles on the unit 
circle [42, 16]. Both Gauss and Zolotarev are well-suited choices for the Hermitian 
problem since they favor an accentuation of the decay of \p a \ at the boundaries of the 
interval along the real axis. The Trapezoidal rule, in turn, leads to a more uniform 
decay for |p a | in any directions of the complex plane [37], and it is also well-known for 
its exponential convergence property with the number of integration nodes n e [40]. 
The Trapezoidal rule is then expected to provide more consistency for capturing the 
complex eigenvalues of the non-Hermitian problem. 

Similarly to the Hermitian problem, the non-Hermitian FEAST algorithm also 
requires as input a search subspace of size mo chosen not smaller than the number of 
eigenvalues m within a given complex contour. If mo (mo > to) is chosen too small, the 
ratio governing the convergence may come closer to one, leading to slow convergence. 
Alternatively, if mo is chosen too large, it may result in an unnecessary high number 
of right-hand-sides when solving the shifted linear systems in (2.9) and (2.10). Two 
examples have been designed to illustrate the convergence rates dependence on too- 
These tests use the QC324 matrix from the NEP collection [2]. A contour has been 
created with a single eigenvalue Ai inside. The contour and few closest eigenvalues 
can be seen in Figure 4. The rational function \p a \ which has been generated using 
a six-point Trapezoidal rule is also shown in the figure (as a contour plot on the left 
and a 3-D surface plot on the right). Figure 5 shows the convergence of the relative 
residual norms for all the mo eigenvalues along the FEAST subspace iterations in 
the cases mo = 2 (left plot) and mo = 4 (right plot). For mo = 2, A 3 is the closest 
eigenvalue outside of the search subspace and controls the convergence rate. Since 
this eigenvalue is relatively close to the contour, FEAST exhibits slow convergence. 
The case mo = 4, in turn, leads to drastic improvement in the convergence rate which 
benefits from the small values of p a ( A5). 

A typical recommended choice for the search subspace size is m 0 = 2m. In 
practice, however, the exact number of eigenvalues to is unknown beforehand and the 
user must make an educated guess. Alternatively, to can also be estimated using, for 
example, the fast stochastic estimate procedure [ 8 ] that has been recently introduced 
in FEAST v3.0. It is important to note that in some situations slow convergence 
can result if the value of mo is only large enough to include the external eigenvalues 
bordering the contour. This problem can arise when the eigenvalues of interest are 
near a continuum or cluster of eigenvalues. With many eigenvalues closely bordering 
the contour, it may not be possible to improve convergence by increasing the subspace 
size mo- In this case, using additional integration nodes to increase the accuracy of 
p a may be necessary. A utility routine for calculating the rational function has also 
been included in FEAST v3.0 and can be used to investigate convergence for different 
contours and eigenvalue distributions. 
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Fig. 4: Value of the rational function plotted as contour plot (left) and surface plot 
(right) using a hexagonal contour for the QC324 matrix. The left plot includes the 
positions of the four closest eigenvalues. Only a single eigenvalue Ai is inside of the 
contour. More particularly, we note that |p a (Ai)| = 1.0000004 , |p a (A 2 )| = 1.7272309, 
|p a (A 3 )| = 0.4206553, |p a (A 4 )| = 3.6296209 x 1 CT 2 , and |p Q ,(A 5 )| = 6.9332547 x 1 CT 3 . 
The latter is associated with the eigenvalue A 5 which cannot be seen in the Figure 
since it is out of range. 


m 0 =2 
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Fig. 5: Convergence of the residual norms (2.11) associated with eigenvalues A,; in 
Figure 4. Two search subspace size are considered too = 2 (left plot) and too = 
4 (right plot). The dashed lines represent the theoretical linear convergence rate 
| pa (Am 0 +i)/ Pa (A?)j which is perfectly matched by the values returned by FEAST. We 
note that the convergence of the wanted eigenvalue Ai is is considerably slower using 
the smaller size subspace mo = 2 since the eigenvalue A 3 , that governs the convergence 
rate for this case, ends up being too close to the search contour. 
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Finally, and in contrast to the Hermitian problem where the contour nodes can 
be placed away from the eigenvalues (i.e. far enough from the real axis), a contour 
node could end up being located in the vicinity of a complex eigenvalue. In this 
case the rational function could take on values larger than one, and it becomes then 
possible for an eigenvalue outside of the contour to converge at a faster rate than the 
wanted eigenvalues inside. This is what is happening to A 2 in Figures 4 and 5. If a 
contour node is located too close to an eigenvalue, however, it is likely to worsen the 
conditioning of the corresponding shifted linear system in (2.9) and (2.10), making 
then the problem more challenging to solve using an iterative method. 

2.4. Reduced Contour Integration Cost. Non-Hermitian matrices A and B 
come in three flavors: (i) complex general, (ii) real non-symmetric, and (iii) complex 
symmetric. The major computational task performed by FEAST is the numerical 
integration, where a set of linear systems must be solved along a complex contour. 
In the complex general case both Q mo and Q mo are computed explicitly by solving 
the 2 n e (independent) linear systems defined in (2.9) and (2.10). It is important to 
note that most modern numerical libraries that includes direct methods for solving 
linear systems, supply a “transpose conjugate solve” feature as well (i.e. a linear 
system A H x = f can be solved using the factorization of A). Consequently, once 
the (ZjB — A) matrices are factorized in (2.9), the system solves in (2.10) can be 
performed without re-factorizing the conjugate transpose of the matrices. Similarly 
using iterative methods, the conjugate transpose solve could be performed without 
factorizing twice the preconditioner. If such option is available, the contour integra¬ 
tion in the most general case should involve only n e (independent) factorizations and 
2 n e (independent) solves with m 0 right hand sides. For the cases (ii) and (iii) above, 
it is possible to take advantage of some additional matrix properties that result in a 
reduced workload as discussed in following. 

Complex symmetric - For the complex symmetric case (A = A T and B = B T ), 
there exists a relationship between the left and right eigenvectors, which can 
be expressed as conjugate pairs i.e. X = X*. This allows the left subspace 
<2 to be expressed in terms of the right Q using the same simple relationship 
Q = Q*. Therefore Q (2.10) does not need to be calculated, and only the n e 
factorizations and n e solves in (2.9) are then necessary. 

Real non-symmetric - In general the treatment of the real non-symmetric case 
(.A = A* and B = B *) is identical to the complex non-symmetric one. 
However, there exists some savings for specific contours exhibiting symmetry 
across the real axis (i.e C = C*). For this particular case, each integration 
node Zj with j = 1 ,... ,n e /2 in the upper half of the complex plane has a 
conjugate pair z* in the lower half. From the resulting following relationships: 

(zjB - A)* = ( z*B - A) and { Zj B - A) H = ( z*B - A ) T , 

one can show that only the n e /2 factorizations of (ZjB — A) in the upper-half 
contour, along with n e total solves, are needed to obtain both Q mo and Qm 0 
in (2.9) and (2.10). 

The contour integration cost can then be reduced depending on the properties 
of the eigenvalue system, attributes of the complex contour (e.g. if C = C*), or the 
standard feature of transpose conjugate solve offered by most linear system solvers. 
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Table 1 summarizes the number of factorizations and solves effectively needed to per¬ 
form the full contour integration using a total of n e nodes. The cost of the Hermitian 
FEAST algorithm is also provided for reference. 


Family of eigenvalue problems 

( A,B ) properties 

^Factorizations 

#Solves 

Complex general 

N/A 

n e 

2 Tie 

Complex symmetric 

a = a t ,b = b t 

n e 

Tie 

Complex Hermitian with C = C* 

A = A h ,B hpd 

n e /2 

n e 

Real non-symmetric 

N/A 

Tie 

2?7-e 

Real non-symmetric with C = C* 

N/A 

n e /2 

n e 

Real symmetric with C = C* 

A = A T , B spd 

rie/2 

n e /2 


Table 1: Summary of the total number of factorizations and solves effectively needed 
by FEAST to perform the full contour integration using a total of n e nodes. It is also 
assumed that the transpose conjugate solve feature is available for the system solver. 


2.5. Resizing the search subspace. The rank of the subspaces Q mo (2-7) and 
Qm 0 ( 2 - 8 ) is greater than or equal to the number of wanted eigenvalues m (mo > m) 
since the eigenpairs outside of the contour are also accounted for due to inaccuracies 
in the numerical integration. In turn, if too is too severely overestimated the rank 
may end up being less than the subspace size too in finite precision arithmetic. Con¬ 
sequently, Too must then be resized to prevent the subspaces to become numerically 
rank deficient and the reduced matrix Bq = Qm 0 BQ mo singular. Otherwise, the QZ 
algorithm used in the computation of the reduced system can produce infinite eigen¬ 
value solutions [27]. Re-injecting these solutions into the subspace iteration would 
cause problems for the algorithm. The upper bound for the choice of mo should 
be the largest value before the subspaces become numerically rank deficient. One 
possible way to determine this threshold value consists of performing the spectral 
decomposition of Bq and analyzing its eigenvalues. It comes: 

(2.12) Bq = VVV H , 

where F is the diagonal matrix for the eigenvalues { 7 i}i=i,...,m 0 > and V and V are 
respectively the corresponding left and right bi-orthonormal eigenvector subspaces 
(i.e. V H V = Im 0 )- We note that for the Hermitian case where B and hence Bq must 
be positive definite, this step is replaced by monitoring the failure of the Cholesky 
factorization of Bq that could return a negative pivot. The position of the latter 
helped determining the threshold value for too used to resize the subspace accordingly. 
For the non-Hermitian problem, the matrix Bq is singular if there exists an eigenvalue 
equal to zero. In finite precision arithmetic, a zero eigenvalue must be characterized 
relatively i.e. 

(2-13) |7*l < t?*max{| 7 i|,...,| 7 mo |}, 

where r] is relative to the machine precision; e.g. ICR 16 in double precision. If an 
eigenvalue 7 j is then different than the maximum eigenvalue by 16 orders-of-magnitude 
then it is out of range for the double precision arithmetic and is counted as a zero. 
The subspace Too is resized to too such that Bq has no eigenvalues satisfying (2.13). 
The spectral decomposition of Bq is computed at each FEAST iteration, and as it 
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will be discussed in the next section, the resizing is performed in conjunction with a 
B-bi-orthonormalization for the subspaces Qm 0 and Qm 0 . The additional numerical 
cost of diagonalizing Bq is on the order of (but less expensive than) the cost associated 
with the diagonalization of the reduced generalized system. 

As a side remark, it interesting to note that using the expression (2.7) and (2.8), 
Bq can also be written as: 

(2-14) Bq = ( Y» o BX)pl(A)(X H BY mo ). 

Starting from the second FEAST iteration where the Ritz vectors Y mo and Y mo are not 
only span respectively by the true eigenvector subspaces X andA' but they also satisfy 
the property of B-bi-orthonormality (i.e. Y^ o BY mo = I since W H BqW = I in Figure 
2), it is possible to directly identify (2.14) with (2.12). It comes that V = Y^ q BX, 
V H = X H BY mo , and T = p 2 ( A). The latter indicates that the eigenvalues of Bq are 
related to the rational function p a , and can then be used to estimate the convergence 
rate [38]. In order for |p a | to satisfy (2.13), however, r] should be replaced by yYj. 
Consequently, the convergence rate for the algorithm is here limited to 10 ~ 8 in double 
precision arithmetic (a similar argument could be made for the case of the Hermitian 
FEAST which relies on the Cholesky decomposition of the normal-type equation Bq). 
FEAST can then converge in a minimum of 2 iterations to machine precision at 
~ 10 -16 given a sufficiently large enough subspace size too (whose value is also relative 
to the accuracy of p a )■ If needed, it may be possible to obtain higher convergence 
rate (i.e one FEAST iteration) using a direct robust QR factorization or singular value 
decomposition (SVD) of the subspaces Qm 0 and Qm 0 - 

2.6. B-bi-orthonormalization. The intended result of FEAST is a set of B - 
bi-orthonormal vectors. However, the B-bi-orthogonality is not guaranteed after the 
contour integration due to numerical inaccuracies. This is especially pronounced in 
large problems which exhibit a continuum of eigenvalues bordering the search contour. 
The contour integration could potentially include a large number of mixed states 
from the continuum in the subspaces Q mo (2-7) and Q mo ( 2 . 8 ). In our numerical 
experiments, we have found that an explicit B-bi-orthonormalization of the FEAST 
subspaces Q mo and Qm 0 helps improving the stability of the algorithm. Rather than 
performing a QR factorization or SVD of the subspaces, we aim at taking advantage of 
the eigen-decomposition of Bq (2.12) that is already performed in FEAST as discussed 
in the previous section. From (2.12) and since Bq = Qm 0 BQm 0 , it comes: 

(2.15) r = V H B Q V = (V H Q» o )B(Q mo V) = (Q mo V) H B(Q mo V). 

As a result, B-bi-orthonormal subspaces U mo and U mo can be generated by updating 
the current subspaces Q mo and Q mo as follows: 

(2.16) U mo = Q mo VT- 1/2 , U mo = Qm 0 VT- H ' 2 . 

As discussed in the previous section, the subspace size too may have already been 
reduced to too at this stage by allowing the eigenvectors in V and V, corresponding 
to the zero eigenvalues in T, to be removed from the subspace. In practice, a subset 
of V and V composed of too columns vectors can be easily extracted if the eigenpairs 
{ 7 i,Vi = Vei,di = Veiji — are fn'st sorted by decreasing values of | 7 ,|. Denoting 
F moX * 0 and V moX m 0 th e subsets of the new V and V subspaces restricted to their 
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first too columns, and r^ oX ^ 0 the matrix of the first mo sorted eigenvalues, (2.16) 
becomes: 


(2.17) 


Umn — Qmob 


- 1/2 


m 0 'mjXmo 1 m 0 Xmo’ 


Umn — QmoK 


2 


m o v moXmo x moXmo' 


Thereafter, the matrices of the reduced system can be obtained using a new 
Rayleigh-Ritz projection for A and B i.e. By = US o BUm 0 and Ay = U^AUmo- 
In spite of our R-bi-orthonormalization procedure, the resulting Bit is not neces¬ 
sarily identity, or even diagonal, due to numerical inaccuracies and finite precision 
arithmetic. However, this procedure is beneficial as a precursor to the QZ algorithm 
used to solve the reduced generalized problem, since it helps to remove contaminating 
eigenvalues that lie close to the contour. The benefits of our R-bi-ortlionormalization 
step can be seen in Figure 6. This test has been run on the CSH4 matrix [7], an 
801 x 801 complex scaled Hamiltonian from the BigDFT electronic structure code 
[14]. The eigenspectrum and the desired eigenvalues inside of a FEAST custom con¬ 
tour can be seen on the left side of Figure 6. One edge of the contour is parallel to the 
eigenvalue continuum. This results in a large number of mixed states after spectral 
projections in (2.7) and (2.8). Without bi-orthonormalization, the QZ algorithm fails 
to return a R-bi-orthogonal set of eigenvectors for large values of mo- The minimum 
obtained convergence then degrades for larger subspace sizes. By employing our bi- 
orthogonalization procedure the QZ algorithm is more stable and is able to return 
a R-bi-orthogonal set. The minimum obtained convergence remains constant for all 
too values as shown in Figure 6 (right plot). Note that the Bq matrix remains non¬ 
singular for all values of mo and no resizing operations have then been performed (i.e. 
Bu = Bq). 


CSH4 Eigenvalue Spectrum B-bi-orthonormalization procedure 



Fig. 6: On the left: eigenvalue spectrum of CSH4. On the right: minimum obtained 
convergence of the residual norm (2.11) after 20 FEAST iterations plotted in function 
of the subspace size mo- With our bi-orthonormalization procedure, the minimum 
obtained convergence stays relatively constant for all mo- 
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2.7. Spurious Solutions. In certain situation incorrect eigenvalues, so called 
spurious solutions, appear inside of the FEAST contour. These spurious eigenvalues 
do not converge. It is important to note that the corresponding spurious eigenvectors 
do not need to be explicitly removed from the search subspace to guarantee that the 
true solutions will converge along the FEAST iterations. Spurious solutions could then 
be flagged a posteriori once FEAST has converged. The spurious problem, however, 
leads to the practical issue of devising a suitable convergence test. 

In FEAST v2.1 for the Hermitian case using Gauss quadrature along a circle 
contour, the true number of eigenvalues m could be obtained by counting the eigen¬ 
value of Bq (see (2.12) using V = V) satisfying the condition \^i\ <1/4 [38, 12] 
(i.e. |p 0 (Aj)| < 1/2 from (2.14)) which guaranteed that A i is a true eigenvalue within 
[Amin, Amorr]- Since FEAST v3.0 is allowing for custom contour in the complex plane, 
it is not possible to perform a similar test by simply analyzing the values | 7 ;|. A 
new strategy has been developed, which can be used to provide increasingly better 
estimate of the number of true eigenvalue solutions in the search subspace at each the 
FEAST subsequent iteration. 

By definition, if a Ritz eigenpair (A i, yt, y t ) obtained after solving the reduced 
system is a genuine solution of the matrix pencil (A,B), then the eigenpair (p(Ai), 
yi, yt) is also a solution for p(B~ 1 A) (2.3) and p{AB~ 1 ) (2.5). In practice, one can 
perform a comparison between a direct calculation of p{\i) where A,; is the Ritz value, 
and the value p(\i) solution of p(B~ 1 A)yi cs p{K)yi (which is only approximated 
if the Ritz vectors have not yet converged). A suitable choice for the function p 
should allow these two values for p{\i) to differ significantly if A; is spurious, with the 
condition that yf 1 Bp{B~ l A)yi ~ p(\i) can also be easily calculated. The choice of 
the approximate spectral projector p „ (1.4) satisfy both conditions. Using (2.7) and 
(2.14), we note that: 


(2.18) 


Pa(Aj) — Vi BPaiB -1 A) V i = y?BXpl{K)X H By x = [B Q ] 


where [Bq\u denotes the i th diagonal element of Bq. Our identification procedure for 
the spurious solutions can then be summarized by the following three steps: 

1. Compute the corresponding {p a (^i)}i=i,...,m. 0 using (1.4) and the Ritz values 
solution of the reduced system {Ai}i = i l ... i m 0 ; he. 

n e 

(2.19) p a {\0 = £-^r-, 

Zj Ai 

7 = 1 J 


2. Form the Ritz vectors and wait for the contour integration to be performed 
and Bq constructed at the next FEAST iteration. 

3. Compare the calculated values of p a {Xi) with the corresponding diagonal 
values of [Bq]h (which are already sorted), and label Aas spurious if it 
satisfies the following inequality: 


( 2 . 20 ) 


pg(Ai) ~ [Bh 

Pl( AO 


> P, 


where p, is empirically chosen to be 10 -1 . We have found that this criteria 
is both large enough to flag all the spurious solutions, and small enough to 
ensure that true solutions are not mislabeled as soon as they start converging. 
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Once a Ritz eigenpair is flagged as spurious, it is kept in the search subspace but it 
is not accounted for in the test for the residual convergence (2.11). On exit, how¬ 
ever, a sorting procedure on the subspace is used by FEAST to return the converged 
eigenpairs free from spurious solutions. 

2.8. Summary and Complete Algorithm. The algorithm in Figure 7 pro¬ 
vides a complete description of non-Hermitian FEAST. The algorithm presents six 
stages from initialization to convergence test, that further detail the different nu¬ 
merical operations outlined in Figure 2. If the non-Hermitian eigenvalue problem 
is non-defective, FEAST is expected to converge and return the wanted eigenvalues 
associated with the R-bi-orthonormal right and left eigenvector subspaces. The con¬ 
vergence rate that was discussed in Section 2.3 depends on the quality of the filter to 
approximate spectral projector, and the size of the search subspace (hence it depends 
on the number of the contour points n e , and subspace size m o). Some of the current 
limitations of the algorithm are outlined in the following: 

Ill-conditioned linear systems - In contrast to Hermitian FEAST which allows 
the selection of complex shifts (contour points) that are not located on the real 
axis, some of these shifts could potentially come close to a complex eigenvalue 
using non-Hermitian FEAST. Similar to a traditional (Hermitian or non- 
Hermitian) Arnoldi algorithm using shift-and-invert strategy, the resulting 
linear systems may become ill-conditioned. If the shift happens to be at 
the exact position of the eigenvalue, the linear system will also be singular. 
One practical solution of this problem consists of moving the contour nodes 
appropriately and automatically by analyzing the eigenspectrum on-the-fly. 
Defective system - Currently if the system is defective, the QZ algorithm used to 
solve the reduced system in Step-4b of Figure 7 would not produce a set of 
R-bi-orthogonal subspaces. In practice, the algorithm may still be found to 
converge (without Step-2), but further studies are required to analyze the 
action of the approximate spectral projector on the Jordan form (2.1) and 
( 2 . 2 ). 

Ill-conditioned eigenvalue problem - Non-Hermitian systems are sensitive to the 
conditioning of the eigenvalues [15]. A well-known case is the real non- 
symmetric Grcar matrix [39, 9] (e.g. with n = 100), which gives rise to 
extremely sensitive eigenvalues. It appears some noticeable differences in the 
eigenvalue calculated using LAPACK-MATLAB, while comparing between 
the eigenvalue solutions of the matrix and its transpose. If double precision 
arithmetic is desired, this problem would require to perform the numerical 
operations in quad-precision [24]. Interestingly, when FEAST operates on the 
Grcar matrix or its transpose, the problem of sensitivity of the eigenvalues 
is not observed in any selected regions of the complex plane. For this ma¬ 
trix case, the projected reduced eigenvalue problem is then likely to be better 
conditioned than the original one. On the other hand, we have found that en¬ 
forcing the condition of bi-ortliogonality could affect the FEAST convergence 
for some other systems e.g. see the case of the QC2534 matrix discussed in 
Ref. [37]. Further studies are clearly needed to better understand the effect 
of ill-conditioned eigenvalue systems on FEAST. 

3. FEAST Eigensolver v3.0 Outlook. The FEAST numerical library pack¬ 
age [10] has originally been developed to address the Hermitian eigenvalue problem. 
The package was first released (under free BSD license) in Sep. 2009 (vl.0), followed 
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Solving: AXm = BX m A m and A H Xm = B H A m A^ with [A m ]»i G C 

Inputs: A and B general matrices in C nXn ; Search subspace size mo > m; 

Search contour nodes/weights { 27 ,..., z Ue }, {w i,..., u; ne } 

0- Initialization 

O.a Choose mo independent vectors Ym 0 = {yi, - • • ,ym 0 }nxm 0 (random or initial 
guess) 

O.b Choose mo independent vectors Ym 0 = {i/i, • • • i5m 0 }nxm 0 (random or initial 
guess) 

1- Contour Integration (optimization schemes detailed in Section 2.4) 


l.a Solve: ( ZjB - A)Q ( J l ' > 0 = BY mo 
l.b Solve: (ZjB - A) H Q $ 0 = B H Y mo 



For each pair {zj , ujj } 


2- Spurious Detection 

2.a Form the projected matrix Bq = Q^ Q BQm 0 

2.b Identify the number of spurious solutions m s starting from the second FEAST 


iteration (Section 2.7) 


3- Resize and B-bi-orthonormalization 

3.a Perform the spectral decomposition Bq = VVV H 

3.b Define new subspace dimension mo if needed (Section 2.5) 

3.c Extract the first mo columns of V, V and F sorted by descreasing values of | 7 ^| 

3. d Form R-bi-orthonormal subspaces U^ 0 and (Section 2.6) 

4- Rayleigh-Ritz Procedure 

4. a Form the matrices Btj = U~ BU^ n and Atj = U~ AU^^ 

u m o m 0 u m o m 0 

4.b Solve AuW = BjjWAu and Au H W = Bu H WX[j\ with W H B V W = I 

4.c Compute p a {^i) (1-4) for the Ritz values (A^ = [A[/]jj); To be used by Step-2. b 

4. d Compute Ritz vectors Ym 0 = f7m 0 W and Y^ 0 = t/m 0 W 

5- Convergence Test 

5. a Find the number of Ritz values m r located inside the search contour 

5.b Compute the residuals (2.11) of the corresponding m r eigenpairs 

5.c If convergence criteria is not reached for the m = (m r — m s ) lowest calculated 
residuals, begin next iteration at Step-1 with mo —> 

5.d Place the converged eigenpairs within the first m columns of Ym , h TO and A q , 
and exit 


Output: A m — Ym ; X m — ; X^BXm — ; A m — AQ m 


Fig. 7: FEAST Non-Hermitian general algorithm 


by upgrades in Mar. 2012 (v2.0), and Feb. 2013 (v2.1). The latter was adopted 
by Intel math kernel library (Intel-MKL). The current version of the FEAST pack¬ 
age (v3.0) released in Jun. 2015, started including all the various implementation of 
the non-Hermitian algorithm (real non-symmetric, complex symmetric, and complex 
general) on both shared-memory systems (i.e. FEAST-SMP version) and distributed 
architectures (i.e. FEAST-MPI version). FEAST’s implementation exploit a key 
strength of modern computer architectures, namely, multiple levels of parallelism. 
FEAST-MPI includes the three levels of parallelism: MPI for the search contour - 
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MPI for the distribution of the linear systems along the contour nodes - OpenMP for 
the system solver. 

All functionalities of FEAST are accessible through a set of standard predefined 
interfaces. The “ready-to-use” default drivers are capable to accept dense, banded, 
and sparse (CSR) matrix formats. For solving the shifted linear systems, the dense, 
banded, and sparse FEAST interfaces make use of LAPACK [1], SPIKE-SMP [26], 
and Pardiso [34] (MKL-version), respectively. For more advanced users, the FEAST 
library also includes features such as reverse communication interfaces (RCI) that are 
both matrix format and linear system solver independent. These RCI interfaces can 
then be customized by the end users to allow maximum flexibility for their applica¬ 
tions. In particular, the user is in control of the three major numerical computations 
to perform on matrices: (i) Factorize (ZjB — A) (and (ZjB — A) H if needed); (ii) 
Solve (ZjB — A)Q$ 0 = BY mo and (ZjB — A) H Qm\ = B H Y mo ; (iii) Mat-vec procedure 
involving the multiplications of matrices A , B , A H , B H with mo multiple vectors. 
In order to address very large sparse systems, customized routines such as itera¬ 
tive linear system solvers with or without preconditioners, or domain decomposition 
techniques, can straightforwardly be plugged into the RCI loop to perform these op¬ 
erations. Consequently, the software package has been very well received by the HPC 
and application developers, especially in the electronic structure and nanoelectronics 
communities (e.g. [6, 14, 30]). 

In addition to the non-Hernritian interfaces, various supporting routines have also 
been added in v3.0. These includes: (i) a fast stochastic estimator that can provide 
a reasonable guess of the number of eigenvalues count within a user-defined search 
contour [8]; and (ii) a routine that can assist the user to extract nodes and weights from 
a custom design arbitrary geometry in the complex plane. This is particularly helpful 
for non-Hermitian routines as it grants flexibility in targeting specific eigenvalues. 

4. Numerical Experiments. The non-Hermitian eigenvalue problem (NEP) 
collection [2] has been used for testing and development. Our test parameters and 
results for a set of selected system matrices are provided in Table 2. A subset of the 
eigenpairs has been targeted for each system matrix corresponding to the information 
provided in the NEP collection, if available. Only a few number of FEAST subspace 
iterations, is needed for most systems to reach convergence. 


Matrix 

n 

mo 

m 

mid 

r 

^Iteration 

BFW782 

782 

44 

22 

(-5300,300) 

10000.0 

2 

BWM200 

200 

36 

18 

(-1200,0.0) 

60.0 

2 

CDDE5 

961 

140 

70 

(4.75,0.0) 

0.25 

2 

GRCAR 

100 

38 

19 

(0.3,0.2) 

0.5 

4 

QC324 

324 

72 

37 

(0.0,0.0) 

0.04 

3 

RBS480 

480 

112 

56 

(0.0,0.5) 

0.5 

9 

RW136 

136 

38 

19 

(1.0,0.0) 

0.5 

5 

TOLS340 

340 

16 

8 

(-60,300) 

30.0 

3 

TOLS4000 

4000 

144 

72 

(-60,300) 

233.0 

8 


Table 2: Non-Hermitian test cases from NEP collection. The contour is chosen as a 
full circle defined by the center and radius (A m jd,r) using n e = 16 integration points, 
and the criteria of convergence for the residual is set at 10~ 12 . The system size n, the 
subspace size ?no, the final number of eigenvalues m found within the search contour, 
the final residual, and number of FEAST iterations to reach convergence are also 
listed. 
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4.1. Parallelism. As mentioned previously, a major advantage to FEAST are 
the multiple levels of parallelism naturally contained within the algorithm. The fol¬ 
lowing results were gathered on a shared memory machine with 8 10-core Intel Xeon 
E7-8870 processors. Each MPI process uses 5 cores. 

Multiple contours can be solved independently using the first level of parallelism 
of FEAST (overall orthogonality is also largely preserved [38, 11]). However, there 
is a threshold on the number of eigenvalues that can be calculated efficiently using 
a single FEAST contour. In practice mo should represent only a small percentage 
of the matrix size and it may not be suitable to go beyond few thousands because 
of the O(toq) complexity of the reduced system solve. If enough parallel resources 
are available, however, the solution for an arbitrary large number of eigenvalues can 
be obtained by partitioning the entire search domain into multiple contours. FEAST 
can then be applied to each in parallel with a reduced value for Too- An example of 
such partitioning is illustrated in Figure 8. The test uses the FEAST dense interfaces 




Fig. 8: A 4000 x 4000 dense matrix has been constructed such that all eigenvalues 
exist within the unit disk. Multiple FEAST contours have been used to calculate a 
subset of the eigenvalues in parallel. 


on a 4000 x 4000 dense matrix constructed such that all eigenvalues exist within the 
unit disk. Two sets of contours are considered: First, squares with 4 trapezoidal 
intervals along each line segment for a total of 16 linear systems to be solved; Next, 
circles defined by 16 integration nodes. In all cases the size of the search subspace 
is set at too = 200, and the criteria of convergence for the residual at 10 -12 . At 
first we consider using only one MPI process per contour, so the 16 linear systems are 
solved one after another using the LAPACK dense solver. Table 3 reports the number 
of eigenvalues found in each contour, the number of FEAST iterations, and the total 
simulation times. Two simulation times are given, the fastest has been obtained using 
a new option offered in FEAST v3.0 that allows to save and reuse the factorization 
at each iteration (increasing then the memory footprint by the number of integration 
nodes, but removing the need to perform this costly step multiple times). Saving 
the factorization between FEAST iterations produced a 2 — 3x speed improvement 
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for all contours. As it can be observed from the number of FEAST iterations and 
the simulation times in Table 3, load balancing becomes an issue with some contours 
taking more than twice the time of the fastest converging contour. Since FEAST runs 
in parallel, its overall efficiency depends on the slowest converging contour (i.e Square 
5 or Circle 3). 


Contour N° 

m 

# Iterations 

Time-1 (s) 

Time-2 (s) 

Square 





i 

84 

9 

556 

236 

2 

85 

7 

443 

197 

3 

95 

15 

891 

359 

4 

83 

12 

723 

299 

5 

73 

19 

1107 

438 

6 

69 

12 

718 

297 

Circle 





i 

120 

4 

277 

137 

2 

129 

8 

500 

217 

3 

137 

11 

666 

278 

4 

118 

8 

503 

218 

5 

109 

6 

389 

177 

6 

104 

4 

274 

137 


Table 3: Timing results, number of eigenvalue m and number of iterations obtained 
using FEAST for each contour in Figure 8, with mo = 200, n e = 16 and one single MPI 
process per contour. Two total times are reported by contour: Time-1 for FEAST 
normal use, and Time-2 that does not account for the cost of the multiple matrix 
factorizations along the FEAST iterations which are saved in memory. We note that 
the overall parallel FEAST efficiency is limited by the slowest individual performance 
on a single contour obtained here for either Square 5 or Circle 3. 


Better performances can be achieved by taking advantage of another level of 
parallelism for solving the set of independent linear systems. In the general case, as 
mentioned in Section 2.4, a single factorization and two solves must be performed at 
each integration node. With a total of n e factorizations and 2 n e solves, the simulation 
time could then potentially be reduced by a factor n e or more (since the linear systems 
do not need to be re-factorized at each iteration if n e is equal to the #MPI processes). 
Table 4 presents scalability results for the 4000 x 4000 dense matrix considered in Table 
3. For this small dense example, one observes only a maximum of ~ 11 x speed-up 
compared to a single process using 16 MPI processes. The relatively small size of the 
test matrix is a limiting factor since it leads to comparable times between solving a 
single linear system and the other numerical operations that take place in FEAST 
(e.g. inner product to form the reduced system, solution of reduced system, etc.). 
Better scalability performances could be expected using much larger sparse systems. 

Conclusion. The detailed work developing the non-Hermitian FEAST algorithm 
has been presented. This constitutes a generalization of the well established FEAST 
Hermitian algorithm, leading to a significant upgrade of the FEAST solver package. 
The major differences between the Hermitian and non-Hermitian FEAST algorithms 
stem from the complex eigenvalues, which require a two-dimensional search contour. 
Dual subspaces are necessary to allow for computation of a H-bi-orthogonal basis of 
left and right eigenvectors. In order to improve the stability of the algorithm, tech¬ 
niques of subspace resizing, B bi-ortlionormalization procedure and spurious detection 
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Contour N° 

1 

2 

3 

4 

5 

6 

Speed-up 

Square 

1 MPI 

556 

443 

891 

723 

1107 

718 

1.00 

2 MPI 

303 

231 

457 

370 

566 

368 

1.96 

4 MPI 

160 

121 

244 

198 

303 

196 

3.65 

8 MPI 

98 

88 

149 

132 

201 

128 

5.51 

16 MPI 

56 

41 

70 

62 

95 

69 

11.65 

Circle 

1 MPI 

277 

500 

666 

503 

389 

274 

1.00 

2 MPI 

147 

252 

338 

253 

196 

139 

1.97 

4 MPI 

81 

139 

187 

140 

108 

77 

3.56 

8 MPI 

49 

109 

148 

111 

85 

60 

4.50 

16 MPI 

28 

46 

60 

45 

38 

30 

11.10 


Table 4: MPI scalability results for the system matrix and contours considered in 
Figure 8 and Table 3. The first column indicated the cluster of MPI processes being 
used by each contour to distribute the linear systems. The last column indicates the 
speed-up performance associated with the slowest contour (Square 5 or Circle 3). 


have been implemented and successfully tested. We note that the convergence prop¬ 
erty and parallel capability associated with the traditional FEAST algorithm have 
been retained with the non-Hermitian algorithm. Finally, the detailed and complete 
non-Hermitian FEAST algorithm implemented in v3.0 is provided, and limitations of 
its applicability have also been discussed. 
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